Find Datasets

If there is one sentence that summarizes the essence of learning data science, it is this:

The best way to learn data science is to apply data science.

If you are a beginner, you improve tremendously with each new project you undertake. If you are an experienced data science professional, you already know what I am talking about.

However, when I give this advice, people usually ask something in return: where can I get datasets for practice? They don’t realize how many datasets are openly available, or how much they can learn from working on projects with them, learning that can give their career a real boost.

If the situation above applies to you, don’t worry: you are in the right place. This article provides a list of websites and resources whose data you can use for your own (pet) projects or even to build your own products.

How can you use these sources?

There is no end to how you can use these data sources; their applications are limited only by your creativity.

The simplest way to use them is to create data stories and publish them on the web. This will improve not only your data and visualization skills but also your structured thinking.

On the other hand, if you are planning or building a data-based product, these datasets can make it more powerful by providing additional or new input data.

So go ahead: work on these projects and share them with the wider world to showcase your data prowess!

I have divided these sources into sections to help you categorize them by application. We start with simple, generic, easy-to-handle datasets and then move on to huge, industry-relevant ones. We then provide links to datasets for specific purposes such as text mining, image classification and recommendation engines. Together, these sections should give you a holistic list of data resources.

If you can think of any application of these datasets or know of any popular resources that I have missed, please feel free to share them with me in the comments below.

Simple & Generic datasets to get you started

Huge Datasets – things are getting serious now!

Datasets for predictive modeling & machine learning

Image classification datasets

Text Classification datasets

Datasets for Recommendation Engines

Websites that curate lists of datasets from various sources